12 research outputs found

    Augmenting concept definition in gloss vector semantic relatedness measure using Wikipedia articles

    Semantic relatedness measures are widely used in text mining and information retrieval applications. In this paper we attempt to improve the Gloss Vector relatedness measure so that it estimates the relatedness between two given concepts more accurately. This measure constructs concept definitions (glosses) from a thesaurus and computes relatedness from the angle between the concepts’ gloss vectors. Constructing these definitions is challenging, however, because thesauri do not provide expressive definitions for all concepts, particularly specialized ones. We aim to augment these definitions using Wikipedia articles and other external resources. Applying both definition types to the biomedical domain, using MEDLINE as the corpus, UMLS as the default thesaurus, and a reference standard of 68 concept pairs manually rated for relatedness, we show that exploiting resources available on the Web has a positive impact on the final measurement of semantic relatedness.

    Improving Gloss Vector Semantic Relatedness Measure by Integrating pointwise mutual information optimizing second-order co-occurrence vectors computed from biomedical corpus and UMLS

    Semantic relatedness methods are essential for a wide range of tasks such as information retrieval and text mining. This paper attempts to improve the Gloss Vector semantic relatedness measure so that it estimates the relatedness between two input concepts more reliably. The measure normally applies a frequency cut-off to bigrams in order to remove low- and high-frequency words, which usually are not significant features. This naive cut-off, however, can discard valuable information. We instead use pointwise mutual information (PMI) as a measure of association between features, so that the elimination step is carried out in a statistically grounded way. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, MeSH as the thesaurus, and an available reference standard of 311 concept pairs manually rated for semantic relatedness, we show that PMI is a more effective way of removing insignificant features than a frequency cut-off.
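
The PMI-based filtering described in this abstract can be illustrated with a minimal sketch. It assumes PMI is estimated from raw bigram counts and that a bigram is kept when its PMI exceeds a threshold. The function names `pmi_scores` and `keep_significant`, the toy data, and the threshold of 0.0 are hypothetical choices for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def pmi_scores(bigrams):
    """Return PMI (base 2) for each distinct (w1, w2) pair in a list of bigrams.

    Probabilities are simple relative frequencies over the observed bigrams
    (an assumption made for this sketch).
    """
    pair_counts = Counter(bigrams)
    left_counts = Counter(w1 for w1, _ in bigrams)
    right_counts = Counter(w2 for _, w2 in bigrams)
    total = len(bigrams)

    scores = {}
    for (w1, w2), n in pair_counts.items():
        p_xy = n / total
        p_x = left_counts[w1] / total
        p_y = right_counts[w2] / total
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def keep_significant(bigrams, threshold=0.0):
    """Keep only bigrams whose PMI is above the (hypothetical) threshold."""
    return {pair for pair, s in pmi_scores(bigrams).items() if s > threshold}


# Toy data: "heart attack" co-occurs more often than chance, "heart the" less.
toy = [("heart", "attack")] * 3 + [("heart", "the")] + [("of", "the")] * 4
kept = keep_significant(toy)
```

Unlike a fixed frequency cut-off, this keeps or drops a pair according to how strongly its two words are associated, so a rare but strongly associated pair can survive.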

    Applying semantic similarity measures to enhance topic-specific web crawling

    As the Internet grows rapidly, finding desired information becomes a tedious and time-consuming task. Topic-specific web crawlers, proposed as an ideal solution, tackle this issue by traversing the Web and collecting information related to the topic of interest. Various methods have been proposed for this purpose, but they rarely consider the intended sense of the given topic, which plays an important role in finding relevant web pages. In this paper, we attempt to improve topic-specific web crawling by disambiguating the sense of the topic, which avoids crawling irrelevant links associated with other senses of the topic. To this end, we take the semantics of link hypertext into account and employ the Lin semantic similarity measure in our crawler, named LinCrawler, to distinguish links related to the topic sense from the others. We compare LinCrawler against TFCrawler, which considers only the frequency of terms in hypertexts. Experimental results show that LinCrawler outperforms TFCrawler in collecting relevant web pages.
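
The Lin measure named in this abstract is commonly defined as sim(c1, c2) = 2 · IC(lcs) / (IC(c1) + IC(c2)), where IC(c) = -log p(c) and lcs is the lowest common subsumer of the two concepts. The sketch below applies that definition to a tiny hand-made taxonomy. The taxonomy, the probabilities, and all names are invented for illustration, and the abstract does not describe how LinCrawler itself computes these quantities.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "feline": "animal",
    "cat": "feline",
    "jaguar_animal": "feline",
    "jaguar_car": "vehicle",
}

# Hypothetical concept probabilities (the root has probability 1.0).
PROB = {
    "entity": 1.0,
    "animal": 0.5,
    "vehicle": 0.5,
    "feline": 0.1,
    "cat": 0.05,
    "jaguar_animal": 0.02,
    "jaguar_car": 0.02,
}


def information_content(concept):
    """IC(c) = -log p(c); the root therefore has IC 0."""
    return -math.log(PROB[concept])


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lowest_common_subsumer(c1, c2):
    """First ancestor of c2 (walking upward) that also subsumes c1."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")


def lin_similarity(c1, c2):
    """Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    denom = information_content(c1) + information_content(c2)
    if denom == 0:
        return 0.0
    lcs = lowest_common_subsumer(c1, c2)
    return 2 * information_content(lcs) / denom
```

With these toy values, a link whose anchor text maps to `cat` scores higher against the topic sense `jaguar_animal` than against `jaguar_car`, which is the kind of sense-based distinction the abstract describes.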

    Gene functional similarity analysis by definition-based semantic similarity measurement of GO terms

    The rapid growth of biomedical data annotated with the Gene Ontology (GO) vocabulary demands an intelligent method for measuring semantic similarity between GO terms, which greatly facilitates the analysis of functional similarity between genes. This paper introduces two efficient methods for measuring the semantic similarity and relatedness of GO terms. By taking the definitions of GO terms into consideration, these methods address limitations of existing GO term similarity measures. The two measures are, in essence, optimized and adapted versions of the Gloss Vector semantic relatedness measure for estimating similarity and relatedness between GO terms. After optimized and similarity-adapted definition vectors (gloss vectors) are constructed for all terms in GO, the cosine of the angle between two terms’ definition vectors represents their degree of similarity or relatedness. Experimental studies show that this definition-based approach outperforms all existing methods in terms of correlation with gene expression data.
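
The final step described in this abstract, taking the cosine of the angle between two definition vectors, can be sketched as follows. This is a deliberate simplification: it builds first-order bag-of-words vectors directly from the definition text, whereas the Gloss Vector measure uses co-occurrence-based vectors and the paper applies further optimization. The function names and the sample definitions are hypothetical.

```python
import math
from collections import Counter


def definition_vector(definition):
    """Bag-of-words vector for a definition (simplified stand-in for a gloss vector)."""
    return Counter(definition.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as Counters."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented sample definitions, not taken from GO.
term_a = definition_vector("binding of a protein to dna")
term_b = definition_vector("binding of a protein to rna")
term_c = definition_vector("transport across membrane")

similar = cosine(term_a, term_b)
unrelated = cosine(term_a, term_c)
```

A value near 1 indicates definitions that share most of their weighted vocabulary, and 0 indicates no overlap at all.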

    Definition-based information content vectors for semantic similarity measurement

    Ontologies, as representations of shared conceptualizations for a variety of specific domains, are at the heart of the Semantic Web. To facilitate interoperability across multiple ontologies, an automatic mechanism for aligning ontologies is needed, and many methods have therefore been proposed for measuring similarity between concepts in two different ontologies. In this paper, we enumerate these methods along with the shortcomings of each. In information content (IC) based similarity measures, computing the IC of concepts is challenging and in many cases fails. We propose a new approach based on concepts’ definitions, which allows information content to be computed reliably and easily. Applying these methods to the biomedical domain, using MEDLINE as the corpus, the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9CM) as the thesaurus, and an available reference standard, we find that our method outperforms other similarity measures.

    Improving multi-term topics focused crawling by introducing term Frequency-Information Content (TF-IC) measure

    With the rapid growth of the Internet, finding desired information has become a challenging and time-consuming task. Focused crawlers tackle this issue by mining the Web to find pages closely relevant to the desired information. A variety of methods have been devised and implemented for this purpose, but the majority of them do not favor the more informative terms in a given multi-term topic. In this paper, we propose a new measure called Term Frequency-Information Content (TF-IC) to prioritize the terms in a multi-term topic accordingly. We compare our measure experimentally against both the Term Frequency-Inverse Document Frequency (TF-IDF) and Latent Semantic Indexing (LSI) measures applied in focused crawlers. The results indicate that our measure is superior to TF-IDF and LSI at collecting relevant web pages for both general and specialized multi-term topics. © 2013 IEEE

    The effect of mobile applications on English vocabulary acquisition

    The study reported here investigates the use and effectiveness of mobile applications in English vocabulary learning. Vocabulary acquisition is an important part of language learning. Advances in technology have greatly changed educational settings in recent years, and the wide use of mobile wireless technologies has created more opportunities to shift the traditional academic environment toward mobile learning. Interactive multimedia is a valuable avenue for communication and education. This research studies the performance of intermediate-level English learners before and after using mobile applications that were introduced to the study group as an intervention. It examines whether multimedia courseware affects vocabulary learning in second language acquisition. The quantitative data revealed a positive change in learners' performance, and the questionnaire analysis indicated that using the applications helped enhance vocabulary learning, confidence, and class participation, and that students had a positive attitude toward the use of multimedia in education.